Quality Scheme Assessment in the Clustering Process
Authors
Abstract
Clustering is mostly an unsupervised procedure, and most clustering algorithms depend on assumptions and initial guesses to define the subgroups present in a data set. As a consequence, in most applications the final clusters require some form of evaluation. The evaluation procedure has to tackle difficult problems, which can be qualitatively expressed as: (i) the quality of the clusters, (ii) the degree to which a clustering scheme fits a specific data set, and (iii) the optimal number of clusters in a partitioning. In this paper we present a scheme for finding the optimal partitioning of a data set during the clustering process, regardless of the clustering algorithm used. More specifically, we present an approach for evaluating clustering schemes (partitions) so as to find the best number of clusters for a specific data set. A clustering algorithm produces different partitions for different values of its input parameters. The proposed approach selects the best clustering scheme (i.e., the scheme with the most compact and well-separated clusters) according to a quality index we define. We verified our approach using two popular clustering algorithms on synthetic and real data sets in order to evaluate its reliability. Moreover, we study the influence of different clustering parameters on the proposed quality index.
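The selection loop described in the abstract (run a clustering algorithm for several parameter values, score each resulting partition, keep the best one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not give the definition of its quality index, so `quality_index` below is a stand-in (mean point-to-centroid distance divided by the smallest centroid-to-centroid distance, lower is better). The k-means routine, its farthest-first initialisation, the toy data, and all function names are likewise hypothetical.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centroids chosen so far (an assumption made
    here for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def quality_index(points, centroids, labels):
    """Stand-in index (NOT the paper's): compactness / separation.
    Requires at least two centroids. Lower values mean more compact,
    better-separated clusters."""
    compactness = sum(
        math.dist(p, centroids[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    separation = min(
        math.dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]
    )
    return compactness / separation if separation > 0 else math.inf

def best_scheme(points, k_values):
    """Score one partition per k (k >= 2) and return (best k, all scores)."""
    scores = {}
    for k in k_values:
        centroids, labels = kmeans(points, k)
        scores[k] = quality_index(points, centroids, labels)
    return min(scores, key=scores.get), scores

# Toy data: three small, well-separated groups around (0,0), (10,0), (0,10).
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

best_k, scores = best_scheme(points, range(2, 6))
print(best_k, scores)
```

Because the scoring function only needs a partition and its cluster centres, the same loop could wrap a different clustering algorithm or a different index without changing `best_scheme`, which mirrors the abstract's claim that the approach is independent of the algorithm used.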
Similar resources
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis
Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates multiple parameters. In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy clustering ...
Application of a Self-Organizing Map for Clustering the Groundwater Quality in Kerman Province and Assessment its Suitability for Drinking and Irrigation Purposes
Evaluation of groundwater hydrochemical characteristics is necessary for planning and managing water resources in terms of quality. In the present study, a self-organizing map (SOM) clustering technique was used to recognize the homogeneous clusters of hydrochemical parameters in water resources (including wells, springs and qanats) of Kerman province; then, the quality classification of groun...
Regulation of Electrical Distribution Companies via Efficiency Assessments and Reward-Penalty Scheme
Improving the performance of electrical distribution companies, as natural monopolies in the electric industry, has always been one of the main concerns of regulators. In this paper, a new incentive regulatory scheme is proposed to improve the performance of electrical distribution companies. The proposed scheme utilizes several efficiency assessments and a 3-dimensional reward-penalty ...